Dynamic Rank/Select Dictionaries with Applications to XML Indexing

Authors

  • Ankur Gupta
  • Wing-Kai Hon
  • Rahul Shah
  • Jeffrey Scott Vitter
Abstract

We consider a central problem in text indexing: given a text T over an alphabet Σ, construct a compressed data structure answering the queries char(i), rank_s(i), and select_s(i) for a symbol s ∈ Σ. Many data structures consider these queries for static text T [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of T. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O((1/ε) n^ε), we suggest how to support rank_s(i), select_s(i), and char(i) queries in O((1/ε) log log n) time, for any ε < 1. The best previous query times for this problem were O(log n log |Σ|), given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Some applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., |Σ| = 2), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast query/slower update is well-suited for a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree T and supports a powerful set of queries on T.

Affiliation: Department of Computer Sciences, Purdue University, West Lafayette, IN 47907–2066, USA ({agupta, wkhon, rahul}@cs.purdue.edu, [email protected]).
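To make the three queries and the two updates concrete, here is a minimal Python sketch of their semantics. It is a naive reference model, not the structure described in the abstract: it stores the text uncompressed and every operation takes linear time. The class name `NaiveDynamicSequence`, the 1-based positions, and the choice that rank counts position i itself are assumptions made for this illustration.

```python
class NaiveDynamicSequence:
    """Reference semantics only: uncompressed, O(n) per operation.

    Hypothetical helper for illustration; it does not achieve the
    query or update bounds stated in the abstract.
    """

    def __init__(self, text=""):
        self._symbols = list(text)

    def char(self, i):
        """Return the symbol at position i (1-based, an assumed convention)."""
        return self._symbols[i - 1]

    def rank(self, s, i):
        """Count occurrences of symbol s among the first i symbols."""
        return self._symbols[:i].count(s)

    def select(self, s, j):
        """Return the 1-based position of the j-th occurrence of s, or None."""
        seen = 0
        for pos, sym in enumerate(self._symbols, start=1):
            if sym == s:
                seen += 1
                if seen == j:
                    return pos
        return None

    def insert(self, i, s):
        """Insert s so that it becomes the symbol at position i."""
        self._symbols.insert(i - 1, s)

    def delete(self, i):
        """Remove the symbol at position i."""
        del self._symbols[i - 1]


seq = NaiveDynamicSequence("abracadabra")
print(seq.char(3), seq.rank("a", 4), seq.select("a", 3))
seq.insert(1, "a")  # the text becomes "aabracadabra"
print(seq.rank("a", 4), seq.select("a", 3))
```

Under these conventions, select_s(rank_s(i)) returns i whenever the symbol at position i is s, which is a convenient sanity check for any faster implementation.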


Similar resources

A Framework for Dynamizing Succinct Data Structures

We present a framework to dynamize succinct data structures, to encourage their use over non-succinct versions in a wide variety of important application areas. Our framework can dynamize most state-of-the-art succinct data structures for dictionaries, ordinal trees, labeled trees, and text collections. Of particular note is its direct application to XML indexing structures that answer subpath q...


Efficient Dynamic Indexing and Retrieval of XML Documents using Three-Dimensional Quasi-BitCube

XML is a new standard for exchanging and representing data on the Internet. Techniques for indexing and retrieval of XML data are drawing increasing attention since they enable one to access certain parts of retrieved documents easily. However, they provide little or no support for adding new documents to an existing document collection, requiring instead that the entire collection be re-indexed...


Upper and Lower Bounds for Text Indexing Data Structures

The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for full-text indices which occupy minute space (FM-indexes) and wavelet trees. These data structures can also be used to represent labeled trees and posting lists. Labeled trees are applied in XM...


Indexing and Querying Semistructured Data Views of Relational Database

The most promising and dominant data format for processing and representing data on the Internet is the semistructured data form termed XML. XML data has no fixed schema; it evolves and is self-describing, which results in management difficulties compared to, for example, relational data. XML queries differ from relational queries in that the former are expressed as path expressions. The efficien...


Assessing the Compliance of the Structural Requirements of Iranian Medical Sciences Journals with the Criteria Expected by PubMed Central

Introduction: In recent years, the number of Iranian medical journals has been growing. To be included in international indexing databases, these journals should comply with the criteria those databases require. So, the aim of this study was to determine the adaptation of Iranian medical journals with the structural criteria of PubMed Central journal sel...



Publication date: 2006